Conversation

@newokaerinasai
Contributor

Have you read the Contributing Guidelines?

Sure

Describe your changes

This PR adds support for the following new features:

  1. Evaluation API
  2. CSV files (for Evaluation purposes)

@newokaerinasai requested review from VProv and mryab July 10, 2025 11:55
    )
elif type == "compare":
    # Check if either model-a or model-b config/name is provided
    model_a_provided = model_a_field or any(
Member

Suggested change
-    model_a_provided = model_a_field or any(
+    model_a_provided = model_a_field is not None or any(
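
The distinction matters because a plain truthiness test conflates "not provided" with "provided but falsy". A tiny illustration of the point (the value below is made up for the example):

```python
model_a_field = ""             # hypothetical: an explicitly passed empty value
bool(model_a_field)            # False -> "model_a_field or any(...)" treats it as missing
model_a_field is not None      # True  -> counted as provided, as the check intends
```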

Member

By the way, should this be any or all?

Contributor Author

If it's "all", then with an incomplete set of parameters it would fail with the error "model_a and model_b are required for compare evaluation", which is not a correct explanation of the problem. Here the only check is "at least something is present"; more granular validation is added below.
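
For readers following the thread, a minimal sketch of the two-stage check being described; the helper name and exact variables are assumptions, not the merged code:

```python
import click

def _require_both_models(model_a_field, model_a_config_params,
                         model_b_field, model_b_config_params):
    # Stage 1: "at least something is present" on each side (hence any, not all)
    model_a_provided = model_a_field is not None or any(model_a_config_params)
    model_b_provided = model_b_field is not None or any(model_b_config_params)
    if not (model_a_provided and model_b_provided):
        raise click.BadParameter(
            "model_a and model_b are required for compare evaluation"
        )
    # Stage 2 (covered further down): granular checks such as rejecting
    # a partially specified model config
```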

)
parameters: Union[ClassifyParameters, ScoreParameters, CompareParameters]
# Build parameters based on type
if type == "classify":
Member

Can we also check that parameters which are not applicable for the evaluation type (e.g., the model-a and model-b parameters for classify and score) are not passed? For instance, we could raise a ValueError if the user passes parameters that don't apply to the chosen evaluation type.

Contributor Author

Ok
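
A minimal sketch of the guard being requested, assuming the options are collected into dicts keyed by flag name; this is an illustration, not the merged implementation:

```python
def _reject_inapplicable_params(type, model_a_params, model_b_params):
    """Raise if model-a/model-b options were passed for an evaluation type that ignores them."""
    if type in ("classify", "score"):
        passed = [
            flag for flag, value in {**model_a_params, **model_b_params}.items()
            if value is not None
        ]
        if passed:
            raise ValueError(
                f"Parameters {', '.join(passed)} are not applicable for {type!r} evaluation"
            )
```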

Comment on lines 395 to 418
if model_b_field is not None:
    # Simple mode: model_b_field is provided
    if any(model_b_config_params):
        raise click.BadParameter(
            "Cannot specify both --model-b-field and config parameters (--model-b-name, etc.). "
            "Use either --model-b-field alone if your input file has pre-generated responses, "
            "or config parameters if you generate it on our end"
        )
    model_b_final = model_b_field
elif any(model_b_config_params):
    # Config mode: config parameters are provided
    if not all(model_b_config_params):
        raise click.BadParameter(
            "All model config parameters are required when using detailed configuration: "
            "--model-b-name, --model-b-max-tokens, --model-b-temperature, "
            "--model-b-system-template, --model-b-input-template"
        )
    model_b_final = {
        "model_name": model_b_name,
        "max_tokens": model_b_max_tokens,
        "temperature": model_b_temperature,
        "system_template": model_b_system_template,
        "input_template": model_b_input_template,
    }
Member

Is it possible to move these checks inside client.evaluation.create? Looks like right now we do not have strong validation for inputs if people use the Python SDK directly instead of the CLI

Contributor Author

Makes sense. That's what I've done
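
For illustration, validation living inside the SDK entry point could look roughly like this; the create signature shown here is an assumption, not the actual merged method:

```python
class Evaluation:
    def create(self, type, model_a=None, model_b=None, **kwargs):
        # Validate here so direct SDK callers get the same errors as the CLI
        if type == "compare":
            if model_a is None or model_b is None:
                raise ValueError("model_a and model_b are required for compare evaluation")
        elif type in ("classify", "score"):
            if model_a is not None or model_b is not None:
                raise ValueError(f"model_a/model_b are not applicable for {type!r} evaluation")
        # ... build the request payload and call the evaluation endpoint ...
```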

@newokaerinasai marked this pull request as ready for review July 22, 2025 12:52
@newokaerinasai merged commit 2e34944 into main Jul 22, 2025
7 of 10 checks passed
@newokaerinasai deleted the add_evals branch July 22, 2025 12:52